[0.9.1][Feature] MoE alltoallv communication optimization for unquantized RL training scenario & alltoallv support for DBO #1547
Conversation
…training sence & alltoallv support dpo Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>
This pull request has conflicts, please resolve those before we can evaluate the pull request.
# at different points based on MoE settings as late as possible.
# Valid sync points are "before_permutation_1", "before_ep_alltoall",
# "before_finish", and "no_sync".
self.cuda_sync_point = "no_sync"
Why use this naming? It seems a little unsuitable in vllm-ascend.
cuda_sync_point has already been renamed to device_sync_point.
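For context, here is a minimal sketch of how such a deferred sync point is typically consumed. Only the attribute (now device_sync_point) and the four valid point names come from the diff above; the surrounding class, method names, and call sites are hypothetical illustrations, not vllm-ascend's actual code.

```python
import torch

class TokenDispatcherSketch:
    """Hypothetical scaffold around the device_sync_point attribute."""

    VALID_SYNC_POINTS = (
        "before_permutation_1",
        "before_ep_alltoall",
        "before_finish",
        "no_sync",
    )

    def __init__(self, sync_point: str = "no_sync"):
        assert sync_point in self.VALID_SYNC_POINTS
        self.device_sync_point = sync_point

    def _maybe_sync(self, point: str) -> None:
        # Block the host only when execution reaches the configured point,
        # so CPU-side work (e.g. computing alltoallv split sizes) overlaps
        # with device kernels for as long as possible.
        if self.device_sync_point == point:
            if hasattr(torch, "npu"):  # torch_npu installed (Ascend)
                torch.npu.synchronize()
            else:
                torch.cuda.synchronize()

    def dispatch(self, hidden_states: torch.Tensor) -> torch.Tensor:
        self._maybe_sync("before_permutation_1")
        # ... permute tokens so each expert's tokens are contiguous ...
        self._maybe_sync("before_ep_alltoall")
        # ... all-to-all exchange across expert-parallel ranks ...
        self._maybe_sync("before_finish")
        return hidden_states
```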
… into v0.9.1-dev
# Conflicts:
#   tests/ut/test_distributed_tensor_parallel.py
#   tests/ut/test_moe_util.py
#   tests/ut/test_token_dispatcher.py
#   vllm_ascend/ascend_forward_context.py
#   vllm_ascend/envs.py
#   vllm_ascend/models/moe_block.py
#   vllm_ascend/models/qwen3_dbo.py
#   vllm_ascend/ops/fused_moe.py
#   vllm_ascend/ops/moe_dispatcher/token_dispatcher.py
Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>
[Feature] MoE alltoallv communication optimization for unquantized RL training scenario & alltoallv support for DBO
Introduction
This PR introduces two key optimizations for MoE model performance:
1. Efficient token dispatcher: adds the alltoallv_seq token dispatcher (adopted from NVIDIA Megatron and Ascend MindSpeed), enabled via VLLM_ASCEND_ENABLE_MOE_ALL2ALL_SEQ=1. A framework-agnostic sketch of the alltoallv exchange appears at the end of this description.
2. DBO support for alltoallv_seq: extends the alltoallv_seq dispatcher to support DBO (Dual Batch Overlap), enabled by setting both VLLM_ASCEND_ENABLE_MOE_ALL2ALL_SEQ=1 and VLLM_ASCEND_ENABLE_DBO=1 (see the usage sketch below).
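Both optimizations are opt-in via the environment variables named above. A minimal enablement sketch; only the two variable names come from this PR, and they must be set before the engine initializes:

```python
import os

# Enable the alltoallv_seq token dispatcher introduced by this PR.
os.environ["VLLM_ASCEND_ENABLE_MOE_ALL2ALL_SEQ"] = "1"
# Additionally enable Dual Batch Overlap on top of alltoallv_seq.
os.environ["VLLM_ASCEND_ENABLE_DBO"] = "1"

# Equivalent shell usage: export the same variables before launching vLLM.
```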
Performance Improvements
Testing on Qwen3-30B-A3B shows nearly 2x throughput improvement compared to the original alltoall implementation.
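For readers unfamiliar with the pattern this PR optimizes: an alltoallv exchange sends a different number of tokens to every expert-parallel rank, unlike a fixed-size alltoall. Below is a minimal sketch using torch.distributed; it illustrates the communication pattern only and is not this PR's implementation.

```python
import torch
import torch.distributed as dist

def alltoallv_dispatch(tokens: torch.Tensor,
                       send_counts: torch.Tensor,
                       group=None) -> torch.Tensor:
    """Variable-sized token exchange (illustrative, not the PR's code).

    tokens:      [num_local_tokens, hidden], pre-permuted so that rows bound
                 for rank r are contiguous.
    send_counts: [world_size], rows this rank sends to each peer.
    Requires a backend with all-to-all support (e.g. HCCL or NCCL).
    """
    # First exchange the per-rank counts so every rank can size its buffer.
    recv_counts = torch.empty_like(send_counts)
    dist.all_to_all_single(recv_counts, send_counts, group=group)

    # .tolist() forces a device-to-host sync: exactly the kind of sync the
    # device_sync_point machinery discussed above tries to defer.
    in_splits = send_counts.tolist()
    out_splits = recv_counts.tolist()

    # Variable-sized exchange: send in_splits[r] rows to rank r and
    # receive out_splits[r] rows from rank r.
    output = tokens.new_empty((sum(out_splits), tokens.shape[1]))
    dist.all_to_all_single(output, tokens,
                           output_split_sizes=out_splits,
                           input_split_sizes=in_splits,
                           group=group)
    return output
```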